Tricks to implement the overlap Dirac operator

Herbert Neuberger

Department of Physics and Astronomy,
Rutgers University, Piscataway, NJ 08855-0849

I present several tricks to help implement the overlap Dirac operator numerically. 
1. Introduction

There are new ways to implement chirality exactly on the lattice, and this theoretical progress can be implemented numerically in a myriad of ways. I am not sure that it makes sense to have all the larger machines (QCDSP at Columbia and at RIKEN/BNL, CP-PACS) run domain walls of exactly the same type. After all, this is only one of many possible truncations of the overlap. The SCRI and Kentucky groups have been more daring and innovative and, I think, their results show that it paid off. My purpose in this talk is to present a few variations on the topic of the direct numerical implementation of the overlap Dirac operator $D_o$.

The plan is to first present the basic procedure […] and then to describe five "tricks".

2. Basics and refinements 

2.1. Basic procedure 

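The objective is: given $\psi$, compute $D_o\psi = (1+\gamma_5\,\epsilon(H_W))\,\psi$. The basic method uses a rational approximation to the sign function […]:

$$\epsilon_n(x) = \frac{(1+x)^{2n}-(1-x)^{2n}}{(1+x)^{2n}+(1-x)^{2n}} = \frac{x}{n}\sum_{s=1}^{n}\frac{1}{x^2\cos^2\frac{\theta_s}{2}+\sin^2\frac{\theta_s}{2}}, \qquad (1)$$

where $\theta_s = \frac{\pi}{n}\left(s-\frac{1}{2}\right)$. Numerically, the main point is that, using the SESAM shifted-mass trick, the cost in floating-point operations of computing $\frac{1}{n}\sum_{s=1}^{n}\left(H_W^2\cos^2\frac{\theta_s}{2}+\sin^2\frac{\theta_s}{2}\right)^{-1}\psi$ is roughly the same as the cost of the single inversion with the smallest shift, $\left(H_W^2\cos^2\frac{\theta_1}{2}+\sin^2\frac{\theta_1}{2}\right)^{-1}\psi$. For the inversion we use the conjugate gradient (CG) algorithm. Memory usage grows linearly with $n$.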
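To make the shifted-mass idea concrete, here is a minimal NumPy sketch, assuming a small dense real symmetric matrix `H` as a stand-in for $H_W$; the helper names `multishift_cg` and `apply_eps_n` are ours, not taken from any lattice code. Each term of the partial-fraction sum is rewritten as $\cos^{-2}\frac{\theta_s}{2}\,(H^2+\tan^2\frac{\theta_s}{2})^{-1}$ with $\frac{\theta_s}{2}=\frac{\pi}{2n}(s-\frac12)$, so all $n$ systems share the Krylov space of the one with the smallest shift and cost one matrix-vector product per iteration. The two vectors kept per shift are what makes memory grow linearly with $n$.

```python
import numpy as np


def multishift_cg(A, b, shifts, tol=1e-10, max_iter=1000):
    """Solve (A + sigma) x = b for every sigma >= 0 in `shifts`, A symmetric
    positive definite, from one Krylov space (shifted CG recurrences)."""
    sig = np.asarray(shifts, dtype=float)
    r = np.array(b, dtype=float)                  # residual of the base system
    p = r.copy()                                  # search direction, base system
    xs = np.zeros((sig.size, r.size))             # one solution per shift
    ps = np.tile(r, (sig.size, 1))                # one search direction per shift
    z_old, z = np.ones(sig.size), np.ones(sig.size)   # zeta_{k-1}, zeta_k
    a_old, beta_old = 1.0, 0.0                    # alpha_{k-1}, beta_{k-1}
    rr, bnorm = r @ r, np.linalg.norm(r)
    for _ in range(max_iter):
        Ap = A @ p                                # the only mat-vec per iteration
        a = rr / (p @ Ap)
        # the shifted residuals are zeta_k * r_k; update the zetas
        z_new = z * z_old * a_old / (
            a * beta_old * (z_old - z) + z_old * a_old * (1.0 + sig * a))
        xs += (a * z_new / z)[:, None] * ps
        r = r - a * Ap
        rr_new = r @ r
        beta = rr_new / rr
        ps = z_new[:, None] * r + (beta * (z_new / z) ** 2)[:, None] * ps
        p = r + beta * p
        z_old, z, a_old, beta_old, rr = z, z_new, a, beta, rr_new
        # for sigma >= 0 one has |zeta| <= 1, so the base system converges last
        if np.sqrt(rr) < tol * bnorm:
            break
    return xs


def apply_eps_n(H, b, n):
    """eps_n(H) b from the partial-fraction form, all n terms from one CG."""
    half = np.pi / (2 * n) * (np.arange(1, n + 1) - 0.5)   # theta_s / 2
    t = np.tan(half) ** 2                                  # increasing in s
    c = 1.0 / (n * np.cos(half) ** 2)
    A = H @ H + t[0] * np.eye(H.shape[0])                  # base: smallest shift
    xs = multishift_cg(A, b, t - t[0])
    return H @ (c @ xs)
```

The result of `apply_eps_n(H, b, n)` can be checked against $\epsilon_n$ evaluated on the eigenvalues of `H` from `np.linalg.eigh`.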
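$\epsilon_n(x)$ has some important properties: it is odd in $x$, it is invariant under $x \to 1/x$, and $|\epsilon_n(x)| \le 1$ for all real $x$. Pick an $n$ such that $\epsilon_n(x) \approx \epsilon(x)$ for $x \in [-A,-\frac{1}{A}] \cup [\frac{1}{A},A]$, $A>1$, and pick $x = \lambda H_W$, with the spectrum of $|H_W|$ bracketed between $\lambda_{\min}$ and $\lambda_{\max}$. Choose $\lambda$ so that $\lambda\lambda_{\min} = \frac{1}{A}$ and $\lambda\lambda_{\max} = A$. Let

$$\kappa = \frac{\lambda_{\max}}{\lambda_{\min}}$$

be the condition number. For the approximation to be good we need $2n \gg A = \sqrt{\kappa}$: for small $x>0$ one has $1-\epsilon_n(x) \approx 2e^{-4nx}$, which at $x=1/A$ is small only when $4n/A$ is large. The problem becomes, as always, that one needs a large $n$ if the condition number is large.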
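The choice of $\lambda$ and $n$ can be sketched in a few lines of plain Python; `choose_scaling` and the target accuracy `delta` are our own illustrative names. Because $\epsilon_n$ is invariant under $x \to 1/x$ and increases on $(0,1]$, its worst deviation from 1 on $[1/A, A]$ sits at the endpoints, so only $1-\epsilon_n(1/A)$ needs to be checked.

```python
import math


def eps_n(x, n):
    """Closed form of eps_n(x) = ((1+x)^2n - (1-x)^2n) / ((1+x)^2n + (1-x)^2n)."""
    p, m = (1 + x) ** (2 * n), (1 - x) ** (2 * n)
    return (p - m) / (p + m)


def choose_scaling(lam_min, lam_max, delta):
    """Return (lam, A, n) with lam*lam_min = 1/A, lam*lam_max = A, and the
    smallest n for which 1 - eps_n(x) < delta on all of [1/A, A]."""
    lam = 1.0 / math.sqrt(lam_min * lam_max)
    A = math.sqrt(lam_max / lam_min)          # A = sqrt(kappa)
    n = 1
    # eps_n rises on (0, 1] and eps_n(x) = eps_n(1/x): the worst point is x = 1/A
    while 1.0 - eps_n(1.0 / A, n) >= delta:
        n += 1
    return lam, A, n


lam, A, n = choose_scaling(lam_min=0.01, lam_max=1.0, delta=1e-6)
```

Here $\kappa = 100$ and $A = 10$, and the loop stops at $n = 37$, so $2n = 74$ is indeed well above $A$.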

2.2. Trick 1 

SCRI […] used another rational approximation for the sign function,

$$\epsilon^{(2)}_n(x) = x\left(c_0+\sum_{k=1}^{n}\frac{c_k}{x^2+d_k}\right). \qquad (2)$$

This approximation is optimal in the $\infty$-norm, and the coefficients of the fraction are computed using the Remez algorithm. Thus, one achieves better accuracy with a smaller $n$. But $|\epsilon^{(2)}_n(x)|$ is no longer bounded by unity, so there is the danger of producing unphysical zeros in $D_o$. The trick I suggest is to combine the two approximations and use $\epsilon_{nm}(x) = \epsilon_n(\epsilon^{(2)}_m(x))$ to recover the bound.
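A small plain-Python sketch of the composition follows. We have no Remez coefficients at hand, so `overshooting_approx` is a hypothetical stand-in: $\epsilon_m$ scaled by $1+\delta$, which, like an equiripple approximation to the sign function, exceeds unity in magnitude. Feeding it into $\epsilon_n$ restores the bound because the denominator $(1+y)^{2n}+(1-y)^{2n}$ dominates the numerator for every real $y$, while values of $y$ near $\pm1$ are mapped even closer to $\pm1$.

```python
def eps_n(x, n):
    """Polar-decomposition approximation; |eps_n(y)| <= 1 for every real y."""
    p, m = (1 + x) ** (2 * n), (1 - x) ** (2 * n)
    return (p - m) / (p + m)


def overshooting_approx(x, m, delta=0.01):
    """HYPOTHETICAL stand-in for a Remez-optimal approximation of order m:
    it exceeds 1 in magnitude by up to delta, as an equiripple fit can."""
    return (1.0 + delta) * eps_n(x, m)


def eps_nm(x, n, m):
    """Trick 1: eps_nm(x) = eps_n(eps^(2)_m(x)), bounded by 1 again."""
    return eps_n(overshooting_approx(x, m), n)


grid = [k / 100 for k in range(-300, 301) if k != 0]
worst_inner = max(abs(overshooting_approx(x, 4)) for x in grid)   # above 1
worst_outer = max(abs(eps_nm(x, 2, 4)) for x in grid)             # at most 1
```

With $\delta = 0$ the composition is simply $\epsilon_n(\epsilon_m(x)) = \epsilon_{2nm}(x)$, since $\epsilon_n(x) = \tanh(2n\tanh^{-1}x)$ for $|x|<1$.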



2.3. Trick 2 

Here I am concerned with memory usage, something that can affect performance dramatically when the cache is exceeded. The idea is to use a two-pass shifted-mass CG. This is similar to a standard procedure applied to Lanczos diagonalization when an eigenvector is also desired. In exact arithmetic the algorithm is the same as the basic one. The cost in floating-point operations is at most a factor of 2 higher, but on a RISC processor with a standard (high) cache-miss penalty one finds much smaller costs for practically interesting values of $n$.
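One way to realize the two-pass idea is sketched below in NumPy, under our own naming (`cg_scalars`, `shifted_scalars`, `two_pass_sum`); it illustrates the principle and is not necessarily the exact scheme meant here. The first pass runs plain CG on the base system and stores only the scalars $\alpha_k$, $\beta_k$. From these, the shifted-CG scalars of every shift follow, and they fix weights $w_j$ with $\sum_s c_s (A+\sigma_s)^{-1}b = \sum_j w_j r_j$. The second pass regenerates the residuals $r_j$ with the same arithmetic and accumulates the sum, so the number of stored vectors no longer depends on $n$, at the price of twice the matrix-vector products.

```python
import numpy as np


def cg_scalars(A, b, tol, max_iter):
    """Pass 1: plain CG on A x = b that keeps only the scalars alpha_k, beta_k."""
    r, p = b.astype(float), b.astype(float)
    rr, bnorm = r @ r, np.linalg.norm(b)
    alphas, betas = [], []
    for _ in range(max_iter):
        Ap = A @ p
        a = rr / (p @ Ap)
        r = r - a * Ap
        rr_new = r @ r
        beta = rr_new / rr
        alphas.append(a)
        betas.append(beta)
        p = r + beta * p
        rr = rr_new
        if np.sqrt(rr) < tol * bnorm:
            break
    return alphas, betas


def shifted_scalars(alphas, betas, sigma):
    """zeta_k, alpha_k^sigma, beta_k^sigma of the shifted system A + sigma."""
    z_old, z, a_old, b_old = 1.0, 1.0, 1.0, 0.0
    zetas, a_s, b_s = [1.0], [], []
    for a, beta in zip(alphas, betas):
        z_new = z * z_old * a_old / (
            a * b_old * (z_old - z) + z_old * a_old * (1.0 + sigma * a))
        a_s.append(a * z_new / z)
        b_s.append(beta * (z_new / z) ** 2)
        zetas.append(z_new)
        z_old, z, a_old, b_old = z, z_new, a, beta
    return zetas, a_s, b_s


def two_pass_sum(A, b, shifts, coeffs, tol=1e-10, max_iter=1000):
    """sum_s coeffs[s] * (A + shifts[s])^{-1} b, storing O(1) vectors."""
    alphas, betas = cg_scalars(A, b, tol, max_iter)
    K = len(alphas)
    # x^sigma = sum_j zeta_j G_j r_j with G_j = alpha_j^sigma + beta_j^sigma G_{j+1}
    w = np.zeros(K)
    for sigma, c in zip(shifts, coeffs):
        zetas, a_s, b_s = shifted_scalars(alphas, betas, sigma)
        G = 0.0
        for j in range(K - 1, -1, -1):
            G = a_s[j] + b_s[j] * G
            w[j] += c * zetas[j] * G
    # Pass 2: regenerate r_0 .. r_{K-1} with identical arithmetic and accumulate
    r, p = b.astype(float), b.astype(float)
    y = np.zeros_like(r)
    for j in range(K):
        y += w[j] * r
        Ap = A @ p
        r = r - alphas[j] * Ap
        p = r + betas[j] * p
    return y
```

With base matrix `A = H @ H + t[0] * np.eye(N)`, shifts `t - t[0]` (where $t_s=\tan^2\frac{\theta_s}{2}$) and weights $1/(n\cos^2\frac{\theta_s}{2})$, multiplying the output by `H` gives $\epsilon_n(H)\,b$, just as the single-pass method does.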

2.4. Trick 3 

The main problem in implementations is that at desirable gauge couplings one often encounters eigenvalues of $H_W$ very close to zero. But the physics behind the overlap construction allows one to replace the argument of the sign function with any reasonable lattice version of the hermitian Dirac operator in the continuum with a large negative mass term. Thus, there is no intrinsic reason for the argument of the sign function to have a spectrum that often extends too close to the origin. The overlap itself can provide replacements $H'_W$ of $H_W$ that are better in this respect. The idea is to use a rough approximation to the sign function, $\epsilon_{\rm rough}(x)$, which is fast to implement, and take

$$H'_W = \rho\,\gamma_5 + \epsilon_{\rm rough}(H_W), \qquad 0 < \rho < 1.$$

The choice of $\rho$ makes the physical Dirac mass negative and, if $\epsilon_{\rm rough}$ were a good approximation to the sign function, $H'_W$ would have a gap in its spectrum around the origin of size $|\rho-1|$ and a condition number $\frac{1+\rho}{1-\rho}$. So, the suggestion is to plug $H'_W$ into $\epsilon_{nm}$ and use either the basic procedure or its two-pass version. The distinguishing feature of this trick is that it uses some physics input.
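A toy NumPy check of the spectral claim follows, assuming a random real symmetric matrix as a hypothetical stand-in for $H_W$ and a diagonal $\pm1$ matrix for $\gamma_5$. With the exact sign function, $(\rho\gamma_5+\epsilon)^2 = 1+\rho^2+\rho\{\gamma_5,\epsilon\}$, and the anticommutator of two hermitian involutions has its spectrum in $[-2,2]$, so the eigenvalues of $|H'_W|$ lie in $[1-\rho, 1+\rho]$. With a rough $\epsilon_n$ only the upper bound is guaranteed.

```python
import numpy as np


def sign_exact(H):
    """Exact matrix sign function of a real symmetric H (dense toy)."""
    w, V = np.linalg.eigh(H)
    return (V * np.sign(w)) @ V.T


def eps_n_matrix(H, n):
    """Rough approximation eps_n(H), applied through the eigenbasis (toy only)."""
    w, V = np.linalg.eigh(H)
    p, m = (1 + w) ** (2 * n), (1 - w) ** (2 * n)
    return (V * ((p - m) / (p + m))) @ V.T


def h_prime(H_w, gamma5, rho, sign_fn):
    """H'_W = rho * gamma5 + sign_fn(H_W)."""
    return rho * gamma5 + sign_fn(H_w)


rng = np.random.default_rng(3)
N = 16
M = rng.standard_normal((N, N))
H_w = (M + M.T) / 2                                       # stand-in for H_W
gamma5 = np.diag([1.0] * (N // 2) + [-1.0] * (N // 2))    # stand-in for gamma_5
rho = 0.5
ev = np.abs(np.linalg.eigvalsh(h_prime(H_w, gamma5, rho, sign_exact)))
ev_rough = np.abs(np.linalg.eigvalsh(
    h_prime(H_w, gamma5, rho, lambda H: eps_n_matrix(H, 2))))
print(ev.min(), ev.max(), ev.max() / ev.min())   # gap and condition number
```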

2.5. Trick 4 

The numerical difficulties are caused by the nonanalyticity of the sign function at zero. The idea here is to double the number of fields so as to ameliorate the singularity in the sign function. In this way one may hope to avoid nested CG if one can replace the rational approximation by a polynomial of moderate degree. To see how this could possibly work […], introduce the fields

Next, consider the following identity, easily proven by Gaussian integration over $\bar\phi$, $\phi$:

The main point is that the function of $x$ appearing in this identity is less violently behaved at the origin than $1/|x|$ and might be easier to reproduce either polynomially or by a low-$n$ rational. In a dynamical simulation the $\det H_W$ prefactor will need to be canceled by pseudofermions. The important point is that the induced action for the $\bar\psi$, $\psi$ fields has the right structure.

2.6. Trick 5 

The moral from Trick 4 is that adding extra fields to induce the desired action for the fields $\bar\psi$, $\psi$ softens the singularity of $\epsilon$. Theoretically, we know that adding an infinite number of fields removes the singularity altogether. For an approximation to the sign function characterized by order $2n$, one expects that the addition of $2n$ fields can eliminate the need for polynomials or rationals altogether. This brings the approximation closer to domain walls, but maintains a larger degree of flexibility.

The trick I am describing below […] rests on two observations: (1) any rational approximation can be viewed as a truncated continued fraction which, when untruncated, would represent the sign function exactly (except exactly at the origin, where the sign function is not defined); (2) any (truncated) continued fraction can be exactly mapped into a (finite) chain model. Rather than presenting the idea in the abstract, let us focus on a chain realization of $\epsilon_n(x)$. The general case will become obvious.

First, the rational approximation has to be written in the form of a continued fraction with entries preferably linear in $H_W$. I start from a formula that goes as far back as Euler (see below), and subsequently use the invariance of $\epsilon_n$ under inversion of $x$ to move the $x$ factors around, so that the entries become linear in $x$.
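Both continued-fraction forms of $\epsilon_n$ can be checked in exact rational arithmetic with Python's `fractions` module; the function names below are ours. Euler's form has $j$-th partial numerator $(4n^2-j^2)x^2$ and partial denominators $1, 3, 5, \ldots, 4n-1$; the linear-entry form is $\alpha_0/(x+\alpha_1/(x+\cdots))$ with $\alpha_0=2n$ and $\alpha_j=\frac{(2n-j)(2n+j)}{(2j-1)(2j+1)}$. The sketch compares both with the closed form.

```python
from fractions import Fraction


def eps_closed(x, n):
    p, m = (1 + x) ** (2 * n), (1 - x) ** (2 * n)
    return (p - m) / (p + m)


def eps_euler(x, n):
    """2nx / (1 + (4n^2-1)x^2 / (3 + (4n^2-4)x^2 / (5 + ... / (4n-1)))),
    evaluated from the bottom up."""
    tail = Fraction(4 * n - 1)                      # last partial denominator
    for j in range(2 * n - 1, 0, -1):
        tail = (2 * j - 1) + (4 * n * n - j * j) * x * x / tail
    return 2 * n * x / tail


def chain_alphas(n):
    """alpha_0 = 2n, alpha_j = (2n-j)(2n+j) / ((2j-1)(2j+1)), j = 1..2n-1."""
    return [Fraction(2 * n)] + [
        Fraction((2 * n - j) * (2 * n + j), (2 * j - 1) * (2 * j + 1))
        for j in range(1, 2 * n)]


def eps_linear(x, n):
    """alpha_0 / (x + alpha_1 / (x + ... + alpha_{2n-1} / x)), bottom up."""
    a = chain_alphas(n)
    tail = x
    for j in range(2 * n - 1, 0, -1):
        tail = x + a[j] / tail
    return a[0] / tail


x = Fraction(1, 2)
print(eps_closed(x, 2), eps_euler(x, 2), eps_linear(x, 2))  # prints 40/41 40/41 40/41
```

The linear form follows from Euler's by dividing by $x$, using $\epsilon_n(x)=\epsilon_n(1/x)$, and rescaling the $j$-th level by $1/(2j+1)$; this is where the $\alpha_j$ come from.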

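$$\epsilon_n(x) = \cfrac{2nx}{1+\cfrac{(4n^2-1)\,x^2}{3+\cfrac{(4n^2-4)\,x^2}{5+\cdots+\cfrac{(4n^2-(2n-1)^2)\,x^2}{4n-1}}}}$$

After the $x$ factors have been moved around, the entries are linear in $x$:

$$\epsilon_n(x) = \cfrac{\alpha_0}{x+\cfrac{\alpha_1}{x+\cfrac{\alpha_2}{x+\cdots+\cfrac{\alpha_{2n-1}}{x}}}},$$

with the coefficients $\alpha_j$ listed below.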
Now, with the help of extra fields, I write a Gaussian path integral which induces the desired action between a chosen subset of fields:

$$\int d\bar\phi_1\,d\phi_1\,d\bar\phi_2\,d\phi_2\cdots d\bar\phi_n\,d\phi_n\; e^{S_*} = (\det H_W)^{2n}\, e^{\bar\psi\,[\gamma_5+\epsilon_n(H_W)]\,\psi}$$



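The quadratic action $S_*$ couples the extended fermionic fields $\bar\chi$, $\chi$: $S_* = \bar\chi\,\mathcal{H}\,\chi$, where the new kernel $\mathcal{H}$ has a block-tridiagonal (chain) structure ending in

$$\mathcal{H} = \begin{pmatrix} \ddots & \ddots & & \\ \ddots & \ddots & \ddots & \\ & \ddots & H_W & \sqrt{\alpha_{2n-1}} \\ & & \sqrt{\alpha_{2n-1}} & -H_W \end{pmatrix}.$$

The numerical coefficients $\alpha$ are given below:

$$\alpha_0 = 2n, \qquad \alpha_j = \frac{(2n-j)(2n+j)}{(2j-1)(2j+1)}, \quad j = 1, 2, \ldots, 2n-1.$$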



The hope is that the condition number of $\mathcal{H}$ will be manageable.

So, at the expense of adding extra fields, one can avoid a nested conjugate gradient procedure when dynamical fermions are simulated. The chain version of the direct truncation of the overlap Dirac operator is similar in appearance to domain walls. But one is free to change both the rational approximation and its chain implementation.
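The chain construction can be tested numerically with a small NumPy sketch, assuming a random real symmetric matrix as a hypothetical stand-in for $H_W$; the function names are ours. We take the $2n\times 2n$ block-tridiagonal kernel with diagonal blocks $+H_W, -H_W, +H_W, \ldots$ and off-diagonal blocks $\sqrt{\alpha_j}\,\mathbf{1}$, $j=1,\ldots,2n-1$, so that its last two block rows read $(\ldots, H_W, \sqrt{\alpha_{2n-1}})$ and $(\ldots, \sqrt{\alpha_{2n-1}}, -H_W)$. The Schur complement of its first block is $H_W+\alpha_1/(H_W+\alpha_2/(\cdots))$, so $\alpha_0$ times the first block of its inverse equals $\epsilon_n(H_W)$. The coupling to $\psi$ and the sign conventions of $S_*$ are not modeled here.

```python
import numpy as np


def chain_alphas(n):
    """alpha_0 = 2n, alpha_j = (2n-j)(2n+j) / ((2j-1)(2j+1)), j = 1..2n-1."""
    j = np.arange(1, 2 * n)
    return np.concatenate(([2.0 * n], (2 * n - j) * (2 * n + j) / ((2 * j - 1) * (2 * j + 1))))


def chain_kernel(H_w, n):
    """2n x 2n block-tridiagonal kernel: diagonal blocks +H_W, -H_W, +H_W, ...,
    off-diagonal blocks sqrt(alpha_j) * identity for j = 1..2n-1."""
    a = chain_alphas(n)
    signs = np.diag([(-1.0) ** k for k in range(2 * n)])
    off = np.diag(np.sqrt(a[1:]), 1)
    off = off + off.T
    return np.kron(signs, H_w) + np.kron(off, np.eye(H_w.shape[0]))


def eps_n_via_chain(H_w, psi, n):
    """alpha_0 times the first block of kernel^{-1}, applied to psi."""
    N = H_w.shape[0]
    rhs = np.zeros(2 * n * N)
    rhs[:N] = psi                                       # source on the first site only
    sol = np.linalg.solve(chain_kernel(H_w, n), rhs)    # one solve, no nested CG
    return 2.0 * n * sol[:N]
```

One linear solve with this larger, sparse, hermitian (but indefinite) kernel replaces the inner CG of the nested scheme; a realistic code would use an iterative solver suited to indefinite systems rather than `np.linalg.solve`.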

Moreover, since here the argument of the approximated sign function is $H_W$, not the rather cumbersome logarithm of the transfer matrix of the domain wall case, eigenstates of $H_W$ with small eigenvalues can be eliminated by projection with greater ease […]. This elimination, although numerically costly, vastly increases the accuracy of the approximation to the sign function. Actually, at this stage of the game and at practical gauge coupling values, the use of projectors seems to be numerically indispensable to direct implementations of the QCD overlap Dirac operator. But no projectors have been implemented in domain wall simulations. However […], the domain wall version is too close to the overlap Dirac operator based on $H_W$ to believe that projections are necessary in one case but can be ignored in the other. Thus, I urge caution when interpreting data obtained using very light domain wall fermions. Domain wall practitioners might consider implementing projectors to improve their reach to low quark masses.
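A dense toy version of low-mode projection is sketched below in NumPy, under our own naming. The $k$ eigenvectors of `H` with the smallest $|\lambda|$ are treated exactly (here from a full `np.linalg.eigh`, standing in for the iterative eigensolver a real code would need), and $\epsilon_n$, rescaled to the remaining spectrum, acts only on $(1-P)b$:
$\epsilon(H)b \approx \sum_{i\le k}\mathrm{sign}(\lambda_i)\,v_i(v_i^\dagger b) + \epsilon_n(\lambda H)(1-P)b$.
The partial-fraction sum is evaluated with one dense solve per term in place of the shifted-mass CG.

```python
import numpy as np


def eps_n_rational(X, b, n):
    """eps_n(X) b from the partial-fraction form, one dense solve per term
    (a stand-in for the shifted-mass CG)."""
    half = np.pi / (2 * n) * (np.arange(1, n + 1) - 0.5)     # theta_s / 2
    I = np.eye(X.shape[0])
    acc = sum(np.linalg.solve(np.cos(h) ** 2 * X @ X + np.sin(h) ** 2 * I, b)
              for h in half)
    return X @ acc / n


def sign_with_projection(H, b, n, k):
    """Treat the k eigenvectors of H with smallest |eigenvalue| exactly and
    apply eps_n, rescaled to the remaining spectrum, to the rest."""
    w, V = np.linalg.eigh(H)                       # toy: full diagonalization
    order = np.argsort(np.abs(w))
    low, Vl = w[order[:k]], V[:, order[:k]]
    coeff = Vl.T @ b
    b_perp = b - Vl @ coeff                        # (1 - P) b
    rest = np.abs(w[order[k:]])
    lam = 1.0 / np.sqrt(rest.min() * rest.max())   # lam*lam_min = 1/A, lam*lam_max = A
    return Vl @ (np.sign(low) * coeff) + eps_n_rational(lam * H, b_perp, n)
```

Because the rescaling now uses the smallest unprojected eigenvalue, $A$ drops sharply, and with it the $n$ needed for a given accuracy; `k=0` reproduces the unprojected method.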

3. Final comments 

Practical tests of the above tricks are both badly needed and embarrassingly few at the moment. There is not much to test in Trick 1. Trick 2 has been tested; its usefulness is architecture dependent. Tricks 3 through 5 have not been tested yet. Still, I think it is important to share insights and maintain flexibility, so I decided to present these ideas at an early stage.